The document discusses a vision for the future of scientific publishing where 1) research data is tracked and linked to publications, 2) authors can pull provenance data into papers from workflow systems, and 3) published papers remain connected to the underlying data and heritage. It outlines several needs to achieve this vision including developing workflow tools, authoring tools, metadata standards, repositories, and social change around how scientists manage and share their work.
8. What is the problem?
1. Researchers can’t keep track of their data.
2. Data is not stored in a way that makes it easy for authors to use.
3. For readers, article text is not linked to the underlying data.
15. The Vision (work done with Ed Hovy, Phil Bourne, Gully Burns and Cartic Ramakrishnan)
1. Research: Each item in the system has metadata (including provenance) and relations to other data items added to it.
2. Workflow: All data items created in the lab are added to a (lab-owned) workflow system.
3. Authoring: A paper is written in an authoring tool that can pull data, with provenance, from the workflow tool into the document in the appropriate representation.
4. Editing and review: Once the co-authors agree, the paper is ‘exposed’ to the editors, who in turn expose it to reviewers. Reports are stored in the authoring/editing system, and the paper is updated until it is validated.
5. Publishing and distribution: When a paper is published, a collection of validated information is exposed to the world. It remains connected to its related data items, and its heritage can be traced.
6. User applications: Distributed applications run on this ‘exposed data’ universe.
[Figure: every item carries a "metadata" tag; a mock-up paper reads "Rats were subjected to two grueling tests (click on fig 2 to see underlying data). These results suggest that the neurological pain pro-"; a Review / Revise / Edit cycle; "Some other publisher".]
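Steps 1, 2 and 5 imply a data model in which every item carries metadata plus provenance links to the items it was derived from, so that a published paper's heritage can be walked back to the raw data. The Python below is a minimal, hypothetical sketch of that idea; `DataItem`, `Registry`, `heritage` and the field names are assumptions for illustration, not part of any system named in this talk.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DataItem:
    """One item in a hypothetical lab system: metadata plus provenance links."""
    id: str
    kind: str                                      # e.g. "raw data", "figure", "paper"
    metadata: Dict[str, str] = field(default_factory=dict)
    derived_from: List[str] = field(default_factory=list)  # ids of parent items


class Registry:
    """Stand-in for a lab-owned workflow store (illustrative only)."""

    def __init__(self) -> None:
        self.items: Dict[str, DataItem] = {}

    def add(self, item: DataItem) -> None:
        # Reject items whose provenance points at unknown parents,
        # so every heritage chain stays resolvable.
        missing = [p for p in item.derived_from if p not in self.items]
        if missing:
            raise KeyError(f"unknown parent items: {missing}")
        self.items[item.id] = item

    def heritage(self, item_id: str) -> List[str]:
        """Every ancestor of item_id, nearest first (breadth-first walk)."""
        seen: Set[str] = set()
        order: List[str] = []
        queue = deque(self.items[item_id].derived_from)
        while queue:
            current = queue.popleft()
            if current in seen:
                continue
            seen.add(current)
            order.append(current)
            queue.extend(self.items[current].derived_from)
        return order


if __name__ == "__main__":
    reg = Registry()
    reg.add(DataItem("raw", "raw data", {"instrument": "hypothetical"}))
    reg.add(DataItem("analysis", "table", derived_from=["raw"]))
    reg.add(DataItem("fig2", "figure", derived_from=["analysis"]))
    reg.add(DataItem("paper", "paper", derived_from=["fig2"]))
    print(reg.heritage("paper"))  # ['fig2', 'analysis', 'raw']
```

Rejecting unknown parents at insert time is one possible design choice; a real system might instead allow references to external, not-yet-registered datasets.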
28. What is needed to get there?
A. Workflow tools: linked-data-based workflow tools for all sciences that are scalable, safe, and user-friendly → tool builders
B. Authoring and reviewing tools that enable the use of rich, provenance-tracked elements → tool builders
C. Metadata standards: standards that allow exchange of information on any knowledge item created in a lab, including provenance, privacy and IPR → standards bodies
D. Semantic/Linked Data XML repositories → publishers
E. Publishing systems as application servers → publishers
F. Social change: scientists store, track and annotate their work → institutes, funding bodies, individuals
32. A. Workflow tools are emerging
http://VisTrails.org
http://MyExperiment.org
http://wings.isi.edu/
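Workflow systems such as these record which step produced which data. As a rough illustration of that idea (not of how VisTrails, myExperiment or Wings actually work), the hypothetical Python decorator below logs a content fingerprint of each step's inputs and output, so a result's lineage can be reconstructed afterwards. The names `tracked`, `fingerprint`, `lineage` and `PROVENANCE_LOG` are invented for this sketch.

```python
import functools
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict, List

# In-memory provenance log; a real workflow system would persist this.
PROVENANCE_LOG: List[Dict] = []


def fingerprint(value) -> str:
    """Short SHA-256 of a JSON rendering, so equal data gets equal ids."""
    data = json.dumps(value, sort_keys=True, default=repr).encode("utf-8")
    return hashlib.sha256(data).hexdigest()[:12]


def tracked(step):
    """Decorator: record step name, input/output fingerprints and run time."""
    @functools.wraps(step)
    def wrapper(*args, **kwargs):
        result = step(*args, **kwargs)
        inputs = [fingerprint(a) for a in args]
        inputs += [fingerprint(v) for _, v in sorted(kwargs.items())]
        PROVENANCE_LOG.append({
            "step": step.__name__,
            "inputs": inputs,
            "output": fingerprint(result),
            "ran_at": datetime.now(timezone.utc).isoformat(),
        })
        return result
    return wrapper


def lineage(fp: str) -> List[str]:
    """Names of the steps that led to the data with fingerprint fp, latest first."""
    steps: List[str] = []
    frontier, seen = [fp], set()
    while frontier:
        current = frontier.pop()
        if current in seen:
            continue
        seen.add(current)
        for record in reversed(PROVENANCE_LOG):  # most recent producer wins
            if record["output"] == current:
                steps.append(record["step"])
                frontier.extend(record["inputs"])
                break
    return steps


@tracked
def normalize(values):
    peak = max(values)
    return [v / peak for v in values]


@tracked
def mean(values):
    return sum(values) / len(values)
```

Calling `mean(normalize([1, 2, 3]))` and then `lineage(fingerprint(result))` walks back through both steps. Fingerprinting by content rather than by object identity means the same data produced twice maps to the same id, which is convenient for linking but hides which run produced it.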
33. B. Authoring is a part of doing science
[Figure: "The Knowledge Ecosystem: Interlocking Cycles of Research". Two cycles, each with Create/modify hypothesis, Perform experiment, Collect data and Draw conclusions, connected through Communicate, Gather info and Synthesize. Slide by Tim Clark]
35. B. Authoring ‘ecosystems’: e.g., SWAN
[Figure: SWAN Semantic Relationships. Nodes: person, group, Claim, hypothesis, concept, gene, comment, publication, and files (Excel file, MSWORD file, PDFs), divided into Public and Private areas. Relations: makes, hasEvidence, annotates, authoredBy, authorOf, shareWith, discussedIn, describes. Slide by Tim Clark]
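The SWAN diagram is, at heart, a set of typed links between people, claims and evidence. The Python sketch below stores a few such links as (subject, predicate, object) triples and answers two questions the model is meant to support: what evidence backs the claims a person makes, and which comments annotate a claim. The predicate names are taken from the slide; the identifiers (`person:alice`, `claim:1`, ...) and the example graph are hypothetical.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical example graph using predicate names from the SWAN slide.
GRAPH: Set[Triple] = {
    ("person:alice", "makes", "claim:1"),
    ("claim:1", "hasEvidence", "publication:p1"),
    ("claim:1", "hasEvidence", "file:excel-1"),
    ("comment:c1", "annotates", "claim:1"),
    ("comment:c1", "authoredBy", "person:bob"),
    ("claim:1", "shareWith", "group:lab"),
}


def objects(graph: Iterable[Triple], subject: str, predicate: str) -> Set[str]:
    """All o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def subjects(graph: Iterable[Triple], predicate: str, obj: str) -> Set[str]:
    """All s such that (s, predicate, obj) is in the graph."""
    return {s for s, p, o in graph if p == predicate and o == obj}


def evidence_for_person(graph: Iterable[Triple], person: str) -> Set[str]:
    """Union of the evidence attached to every claim the person makes."""
    evidence: Set[str] = set()
    for claim in objects(graph, person, "makes"):
        evidence |= objects(graph, claim, "hasEvidence")
    return evidence
```

A plain set of tuples keeps the sketch dependency-free; an RDF store would add indexing, namespaces and a query language such as SPARQL.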
41. C. Metadata: HCLS SIG Scientific Discourse
http://esw.w3.org/HCLSIG/SWANSIOC:
Project Description
Provide a Semantic Web platform for biomedical discourse which
can be evolved over time into a more general facility for many types of
scientific discourse, and which is linked to key biological categories
specified by ontologies.
Discourse categories should include research questions, scientific
assertions or claims, hypotheses, comments and discussion, experiments,
data, publications, citations, and evidence.
Our primary scientific use cases will be derived from problems in
digital scientific communications and web-based research
collaboratories supporting research in neurological disorders and
therapies.
42. C. Metadata: Annotation Ontology
[Figure: an example annotation of http://anyurl.com/sf_pat01.html (ann:annotates), pav:createdBy http://www.ht.org/foaf.rdf#me (rdf:type foaf:Person), pav:createdOn June 1, 2010. It is an atomic tag annotation (hasTag, rdf:type Tag) with hasTopic FMA:skull, linked to "Linear skull fracture". Its ann:context is an InitEndCornerSelector (ImageSelector) onDocument, with init (304, 507) and end (380, 618).
Other annotations on the same document:
1. Atomic annotation on image (tag: “hematoma”)
2. General annotation (tag: “injury”)
Other annotations on similar documents:
1. General annotation (tag: “skull fracture”), related via rdfs:subClassOf
Slide by Tim Clark]
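The annotation on this slide can be written as a handful of RDF-style triples. The Python sketch below does that with plain tuples and shows how a consumer might recover the image region from the selector's init and end corners. The prefixes (`ann:`, `pav:`, `FMA:`) and the values come from the slide, but the exact predicate set, the blank-node ids and the encoding of corners as `"x,y"` strings are assumptions, not the Annotation Ontology's published schema.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]
DOC = "<http://anyurl.com/sf_pat01.html>"


def skull_annotation() -> List[Triple]:
    """The slide's example as triples; '_:ann' and '_:sel' are made-up blank nodes."""
    return [
        ("_:ann", "rdf:type", "ann:Tag"),
        ("_:ann", "ann:annotates", DOC),
        ("_:ann", "pav:createdBy", "<http://www.ht.org/foaf.rdf#me>"),
        ("_:ann", "pav:createdOn", '"2010-06-01"'),
        ("_:ann", "ann:hasTopic", "FMA:skull"),
        ("_:ann", "ann:context", "_:sel"),
        ("_:sel", "rdf:type", "ann:InitEndCornerSelector"),
        ("_:sel", "ann:onDocument", DOC),
        ("_:sel", "ann:init", '"304,507"'),  # corner encoding is an assumption
        ("_:sel", "ann:end", '"380,618"'),
    ]


def value(triples: List[Triple], subject: str, predicate: str) -> str:
    """Return the first object for (subject, predicate); raise if absent."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    raise KeyError((subject, predicate))


def selector_box(triples: List[Triple], selector: str) -> Tuple[int, int, int, int]:
    """(x, y, width, height) of the region spanned by the init and end corners."""
    def corner(predicate: str) -> Tuple[int, int]:
        x, y = value(triples, selector, predicate).strip('"').split(",")
        return int(x), int(y)

    (x0, y0), (x1, y1) = corner("ann:init"), corner("ann:end")
    return min(x0, x1), min(y0, y1), abs(x1 - x0), abs(y1 - y0)


def to_turtle_lines(triples: List[Triple]) -> str:
    """Turtle-style 'subject predicate object .' lines; @prefix declarations omitted."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
```

For the slide's corners (304, 507) and (380, 618), `selector_box` gives a region 76 wide and 111 high.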
45. C. Metadata: Rhetorical Document Task
[Figure: image by Tudor Groza]
Call today: discuss modeling coarse-grained rhetorical structure in PAM (PRISM Aggregator Message), a standard format for transferring XML from publishers to aggregators (used by Nature.com, and by Elsevier in the future).
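52. D. Linked Data: E.g. for Elsevier
[Figure: statements stacked on a section of Elsevier XML. Base: <ce:section id=#123> mice like cheese. Layered on top, in order: "this says", "said @anita on May 31 2010", "but we all know she was jetlagged then". Contrasting labels: "immutable, $$, proprietary" versus "dynamic, personal, task-driven, - open?"]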
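The layering above is a form of stand-off annotation: the publisher's content stays fixed, while annotations point at it, or at each other, from outside. The Python sketch below models that with a read-only content mapping and annotations whose `target` is either a section id or another annotation's id. The store, the `Annotation` fields and the `resolve` helper are hypothetical illustrations, not an Elsevier API.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Dict, List, Optional

# Publisher content layer: a read-only view keyed by section id (hypothetical store).
CONTENT = MappingProxyType({"123": "mice like cheese"})


@dataclass(frozen=True)
class Annotation:
    id: str
    target: str                  # a CONTENT section id or another annotation's id
    body: str
    author: Optional[str] = None


def resolve(annotation_id: str, annotations: Dict[str, Annotation]) -> List[str]:
    """Follow targets from an annotation down to the content it rests on."""
    chain: List[str] = []
    current, seen = annotation_id, set()
    while current in annotations:
        if current in seen:
            raise ValueError(f"annotation cycle at {current}")
        seen.add(current)
        chain.append(annotations[current].body)
        current = annotations[current].target
    chain.append(CONTENT[current])  # KeyError if the chain ends off the content
    return chain
```

With three annotations stacked as on the slide, resolving the topmost one returns the statements from newest to oldest, ending with "mice like cheese". Keeping the content read-only and the annotations separate means personal or task-specific layers never alter the published record.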
59. D. What to link? Semantic annotation grid
[Figure: a three-axis grid.
Granularity: collection, document, claim, triple, entity
Moment (measure): author/editor, typesetter/production, reader/data mining
Means: manual, semi-automated, automated
Examples placed on the grid: Automated Copy Editing, Reflect]
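The grid's three axes can be captured directly as enumerations, which makes it easy to place annotation tools in cells and see which cells are still empty. The Python below is a hypothetical sketch; the axis values come from the slide, while `GRID`, `place` and the example tool name are invented for illustration.

```python
from enum import Enum
from itertools import product
from typing import Dict, List, Tuple


class Granularity(Enum):
    COLLECTION = "collection"
    DOCUMENT = "document"
    CLAIM = "claim"
    TRIPLE = "triple"
    ENTITY = "entity"


class Moment(Enum):
    AUTHOR_EDITOR = "author/editor"
    TYPESETTER_PRODUCTION = "typesetter/production"
    READER_DATA_MINING = "reader/data mining"


class Means(Enum):
    MANUAL = "manual"
    SEMI_AUTOMATED = "semi-automated"
    AUTOMATED = "automated"


Cell = Tuple[Granularity, Moment, Means]

# Every combination of the three axes: 5 x 3 x 3 = 45 cells.
GRID: List[Cell] = list(product(Granularity, Moment, Means))


def place(tools: Dict[str, Cell]) -> Dict[Cell, List[str]]:
    """Group tool names by grid cell, leaving unused cells as empty lists."""
    cells: Dict[Cell, List[str]] = {cell: [] for cell in GRID}
    for name, cell in tools.items():
        cells[cell].append(name)
    return cells
```

For example, a hypothetical automated entity tagger used at reading time would go in `(Granularity.ENTITY, Moment.READER_DATA_MINING, Means.AUTOMATED)`, leaving the other 44 cells empty.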
60. D. A start: .XMP RDF in Elsevier’s PDFs (DC + PRISM)
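XMP metadata is an RDF/XML packet embedded in the PDF, so Dublin Core and PRISM fields can be read back with a generic XML parser. The Python sketch below finds the packet in raw PDF bytes and collects `dc:` and `prism:` properties. It assumes the packet uses the conventional `x:xmpmeta` wrapper and is stored uncompressed, and that PRISM uses the 2.0 "basic" namespace URI; PDFs that compress their metadata stream would need a real PDF library instead.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Namespaces to report; the PRISM URI is assumed to be PRISM 2.0 "basic".
WANTED = {
    "http://purl.org/dc/elements/1.1/": "dc",
    "http://prismstandard.org/namespaces/basic/2.0/": "prism",
}
# Assumes the conventional 'x:' prefix and an uncompressed metadata stream.
XMP_PACKET = re.compile(rb"<x:xmpmeta\b.*?</x:xmpmeta>", re.DOTALL)


def extract_xmp(pdf_bytes: bytes) -> Optional[ET.Element]:
    """Find the first XMP packet in raw PDF bytes and parse it, if present."""
    match = XMP_PACKET.search(pdf_bytes)
    return ET.fromstring(match.group(0)) if match else None


def dc_prism_fields(xmp: ET.Element) -> Dict[str, List[str]]:
    """Collect Dublin Core and PRISM properties as 'prefix:name' -> values."""
    fields: Dict[str, List[str]] = {}
    for desc in xmp.iter(f"{{{RDF}}}Description"):
        # Simple properties may appear as attributes of rdf:Description ...
        found = [(key, [val]) for key, val in desc.attrib.items()]
        # ... or as child elements, possibly wrapping rdf:Seq/Bag/Alt of rdf:li.
        for child in desc:
            items = [li.text or "" for li in child.iter(f"{{{RDF}}}li")]
            found.append((child.tag, items or [(child.text or "").strip()]))
        for tag, values in found:
            uri, _, local = tag.lstrip("{").partition("}")
            if uri in WANTED:
                fields.setdefault(f"{WANTED[uri]}:{local}", []).extend(values)
    return fields
```

Returning lists for every property keeps single-valued fields (such as a volume) and ordered lists (such as `dc:creator`) in one shape.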
75. F. Social Change. Some next steps:
• Fall 2010: ‘Beyond the PDF’:
–Workshop organized by Phil Bourne @UCSD:
• Take one paper from his group
• And all data that went into making that paper
• Including all correspondence, raw data, etc.
–Challenge: how can that be represented better?
• 2010 - 2011: Try to gather resources, current efforts, etc. on a virtual platform
• August 2011: ‘FoRC: Future of Research Communication’
–Dagstuhl Workshop
–Involve key people (including funding bodies, libraries, institutions) to see where the bottlenecks are
–Write white paper, implement...
• Throughout: Start using these tools and writing this way!
82. Interest to collaborate on any of these topics?
A. Workflow tools: Linked-data-based workflow tools for all
sciences: scalable, safe, and user-friendly
B. Authoring and reviewing tools: that enable use of rich
and provenance-tracked elements
C. Metadata standards: Standards that allow exchange of
information on any knowledge item created in a lab,
including provenance/privacy/IPR rights
D. Semantic/Linked Data XML repositories.
E. Publishing systems as application servers
F. Social change: Scientists store, track and annotate their
work.
a.dewaard@elsevier.com